Source https://cedricscherer.netlify.app/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/
chic <- readr::read_csv("https://raw.githubusercontent.com/Z3tt/R-Tutorials/master/ggplot2/chicago-nmmaps.csv")##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## city = col_character(),
## date = col_date(format = ""),
## death = col_double(),
## temp = col_double(),
## dewpoint = col_double(),
## pm10 = col_double(),
## o3 = col_double(),
## time = col_double(),
## season = col_character(),
## year = col_double()
## )
## Rows: 1,461
## Columns: 10
## $ city <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic", "chi…
## $ date <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997-01-05,…
## $ death <dbl> 137, 123, 127, 146, 102, 127, 116, 118, 148, 121, 110, 127, …
## $ temp <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0, 16.0, …
## $ dewpoint <dbl> 37.500, 47.250, 38.000, 45.500, 11.250, 5.750, 7.000, 17.750…
## $ pm10 <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121, 9.364…
## $ o3 <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14.940874…
## $ time <dbl> 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662, 3663, …
## $ season <chr> "Winter", "Winter", "Winter", "Winter", "Winter", "Winter", …
## $ year <dbl> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, …
## # A tibble: 10 x 10
## city date death temp dewpoint pm10 o3 time season year
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 chic 1997-01-01 137 36 37.5 13.1 5.66 3654 Winter 1997
## 2 chic 1997-01-02 123 45 47.2 41.9 5.53 3655 Winter 1997
## 3 chic 1997-01-03 127 40 38 27.0 6.29 3656 Winter 1997
## 4 chic 1997-01-04 146 51.5 45.5 25.1 7.54 3657 Winter 1997
## 5 chic 1997-01-05 102 27 11.2 15.3 20.8 3658 Winter 1997
## 6 chic 1997-01-06 127 17 5.75 9.36 14.9 3659 Winter 1997
## 7 chic 1997-01-07 116 16 7 20.2 11.9 3660 Winter 1997
## 8 chic 1997-01-08 118 19 17.8 33.1 8.68 3661 Winter 1997
## 9 chic 1997-01-09 148 26 24 12.1 13.4 3662 Winter 1997
## 10 chic 1997-01-10 121 16 5.38 24.8 10.4 3663 Winter 1997
##library(ggplot2)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We specify the data outside aes() and add the variables that ggplot maps the aesthetics to inside aes()
geom_line() to create a line plot (not optimal though):Within the geom_* , you can manipulate visual aesthetics such as the color, shape, and size of your points
Each geom comes with its own properties (called arguments) and the same argument may result in a different change depending on the geom you are using.
g + geom_point(color = "firebrick", shape = "diamond", size = 2) +
geom_line(color = "firebrick", linetype = "dotted", size = .3)And to illustrate some more of ggplot’s versatility, let’s get rid of the grayish default {ggplot2} look by setting a different built-in theme, e.g. theme_bw() —by calling theme_set() all following plots will have the same black’n’white theme. The red points look way better now!
theme() is an essential command to manually modify all kinds of theme elements (texts, rectangles, and lines).
the labs() command provides a character string for each label we want to change (here x and y):
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (ºF)")** 💁 You can also add each axis title via xlab() and ylab()** Example:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
xlab("Year") +
ylab("Temperature (ºF")The code below also allows to add not only symbols but e.g. superscripts (with the use of ^):
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = expression(paste("Temperature (", degree ~ F, ")"^"(Hey, why should we use metric units?!)")))We can change the properties of all or particular text elements (here axis titles) by overwriting the default element_text() within the theme() call:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.title.x = element_text(vjust = 0, size = 15),
axis.title.y = element_text(vjust = 2, size = 15))the vjust command refers to the vertical alignment, which usually ranges between 0 and 1, but you can also specify values outside that range
vjust (which is correct form the label’s perspective)
but you can also change the distance by specifying the margin of both text elements:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.title.x = element_text(margin = margin(t = 10), size = 15),
axis.title.y = element_text(margin = margin(r = 10), size = 15))The labels t and r within the margin() object refer to top and right
💡 A good way to remember the order of the margin sides is “t-r-oub-l-e”.
Within the element_text() we can for example overwrite the defaults for size, color, and face:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (ºF)") +
theme(axis.title = element_text(size = 15, color = "firebrick", face = "italic"))The face argument can be used to make the font bold or italic or even bold.italic.
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.title.x = element_text(color = "sienna", size = 15),
axis.title.y = element_text(color = "orangered", size = 15))💁 You could also use a combination of axis.title and axis.title.y, since axis.title.x inherits the values from axis.title. Example:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (ºF)") +
theme(axis.title = element_text(color = "sienna", size = 15),
axis.title.y = element_text(color = "orangered", size = 15))One can modify some properties for both axis titles and other only for one or properties for each on its own:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (ºF)") +
theme(axis.title = element_text(color = "sienna", size = 15, face = "bold"), axis.title.y = element_text(face = "bold.italic"))You can also change the appearance of the axis text (here the numbers) by using axis.textand/or the subordinated elements axis.text.x and axis.text.y:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.text = element_text(color = "dodgerblue", size = 12),axis.text.x = element_text(face = "italic")) ## Rotate Axis Text
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust = 1, size = 12))ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank())
element_blank() removes the element (and thus is not considered an official element)
💡 Use it if you want to get rid of a theme element
We could again use theme_blank() but it is way simpler to just remove the label in the labs() (or xlab()) call:
💡 Note that NULL removes the element (similarly to element_blank()) while empty quotes "" will keep the spacing for the axis title and simply print nothing.
You can ZOOM IN instead of subsetting your data:
ggplot(chic, aes(x = date, y = temp)) +
geom_point(color = "firebrick") +
labs(x = "Year", y = "Temperature (°F)") +
ylim(c(0, 50))## Warning: Removed 777 rows containing missing values (geom_point).
You can also use scale_y_continuous(limits = c(0, 50)) or coord_cartesian(ylim = c(0, 50)). The former removes all data points outside the range while the second adjusts the visible area and is similar to ylim(c(0, 50)). You may wonder: So in the end both result in the same. But not really, there is an important difference—compare the two following plots:
You might have spotted that on the left there is some empty buffer around your y limits while on the right points are plotted right up to the border and even beyond. This perfectly illustrates the subsetting (left) versus the zooming (right). To show why this is important let’s have a look at a different chart type, a box plot:
… Because scale_x|y_continuous() subsets the data first, we get completely different (and wrong, at least if in the case this was not your aim) estimates for the box plots! I hope you don’t have to go back to your old scripts now and check if you maybe have manipulated your data while plotting and did report wrong summary stats in your report, paper or thesis…
You can also force R to plot the graph starting at the origin:
library(tidyverse)
chic_high <- dplyr::filter(chic, temp > 25, o3 > 20)
ggplot(chic_high, aes(x = temp, y = o3)) +
geom_point(color = "darkcyan") +
labs(x = "Temperature higher than 25°F",
y = "Ozone higher than 20 ppb") +
expand_limits(x = 0, y = 0) 💁 Using
coord_cartesian(xlim = c(0, NA), ylim = c(0, NA)) will lead to the same result:
library(tidyverse)
chic_high <- dplyr::filter(chic, temp > 25, o3 > 20)
ggplot(chic_high, aes(x = temp, y = o3)) +
geom_point(color = "darkcyan") +
labs(x = "Temperature higher than 25°F",
y = "Ozone higher than 20 ppb") +
coord_cartesian(xlim = c(0, NA), ylim = c(0, NA))But again, we can force it to literally start plotting from the origin
ggplot(chic_high, aes(x = temp, y = o3)) +
geom_point(color = "darkcyan") +
labs(x = "Temperature higher than 25°F",
y = "Ozone higher than 20 ppb") +
expand_limits(x = 0, y = 0) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0)) +
coord_cartesian(clip = "off") 💡 The argument clip = “off” in any coordinate system, always starting with coord_*, allows to draw outside of the panel area.
Here, let’s plot temperature against temperature with some random noise.
The coord_equal() is a coordinate system with a specified ratio representing the number of units on the y-axis equivalent to one unit on the x-axis. The default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis:
ggplot(chic, aes(x = temp, y = temp + rnorm(nrow(chic), sd = 20))) +
geom_point(color = "sienna") +
labs(x = "Temperature (°F)", y = "Temperature (°F) + random noise") +
xlim(c(0, 100)) + ylim(c(0, 150)) +
coord_fixed()## Warning: Removed 48 rows containing missing values (geom_point).